Tran_Riddle_Historical Embeddings



Setup

First, load in the pre-written group and word lists to be used in analyses:

The agentic and communal lists were borrowed from https://onlinelibrary.wiley.com/doi/10.1002/ejsp.2561; here are some examples:

## [1] "able"           "accomplish"     "accomplishment" "accuracy"       "accurate"      
## [6] "achieve"
## [1] "accept"        "acceptable"    "acceptance"    "accommodate"   "accommodation"
## [6] "accompany"

The group word lists and the trait list were both taken from https://pubmed.ncbi.nlm.nih.gov/35787033/:

## [1] "men"         "man"         "male"        "males"       "masculine"   "masculinity"
## [1] "women"      "woman"      "female"     "females"    "feminine"   "femininity"
## [1] "able"          "abrupt"        "absentminded"  "abusive"       "accommodating"
## [6] "accurate"

The role titles were scraped from https://theodora.com/dot_index.html, a 1971 survey of role titles, and merged with one-word titles from ONET, the modern equivalent: https://www.onetonline.org/find/all. The combined titles were then merged with the chore list, which represents unpaid labor.

## [1] "referee"     "wirer"       "stoneworker" "doweler"     "clerk"       "boiler"

Producing the mac scores

The workhorse function iterates over each decade, computes the MAC score between each word and each group, and then finds the Pearson correlation between the resulting lists of scores (demonstrated visually later).

An example of how the mac function works (using engall 1990): here we compute the mean average cosine similarity of each word in the first list to the list of animals. As expected, the animals in the first list have the highest mac scores.

##   elephant      horse      tiger      happy      weird        car 
## 0.29825993 0.26152604 0.32429946 0.01836855 0.12177856 0.09239695

You can compute the cosine similarity of any two words by replacing the lists with single words.

##     happy 
## 0.4395598
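
The MAC computation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the report's actual function: `mac_sketch`, `cosine`, and the toy matrix `emb` are hypothetical, and the embedding values are made up rather than taken from engall or coha.

```r
# Toy embedding matrix: one row per word, one column per dimension.
# The values are invented for illustration only.
emb <- rbind(
  elephant = c(0.9, 0.1, 0.0),
  horse    = c(0.8, 0.2, 0.1),
  happy    = c(0.1, 0.9, 0.2),
  car      = c(0.0, 0.2, 0.9),
  dog      = c(0.9, 0.2, 0.1),
  cat      = c(0.8, 0.1, 0.2)
)

# Cosine similarity between two numeric vectors.
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Hypothetical helper: for each target word, average its cosine similarity
# to every word in the attribute list. Words missing from the embedding
# are dropped before scoring.
mac_sketch <- function(emb, targets, attributes) {
  targets    <- intersect(targets, rownames(emb))
  attributes <- intersect(attributes, rownames(emb))
  sapply(targets, function(w) {
    mean(sapply(attributes, function(a) cosine(emb[w, ], emb[a, ])))
  })
}

# One score per target word, named by that word.
scores <- mac_sketch(emb, c("elephant", "horse", "happy", "car"), c("dog", "cat"))

# With single words on both sides, the mean is taken over one value,
# so the result is the plain cosine similarity of the two words.
single <- mac_sketch(emb, "happy", "car")
```

Averaging over the attribute list means a target word is scored against the group as a whole rather than against any one of its members.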

Looking at a decade

We can plot the mac scores for two different groups against each other like so:

The titles of the plots contain the Pearson coefficient, which is what we will use to measure the similarity of the two groups.
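
As a rough sketch of this step, the Pearson coefficient can be computed with base R's `cor()` and placed in the plot title. The MAC scores below are made up for illustration, and the variable names are assumptions rather than the report's own.

```r
# Made-up MAC scores for six trait words against two groups.
mac_men <- c(able = 0.12, abrupt = 0.05, absentminded = 0.03,
             abusive = 0.09, accommodating = 0.04, accurate = 0.15)
mac_women <- c(able = 0.10, abrupt = 0.04, absentminded = 0.05,
               abusive = 0.06, accommodating = 0.08, accurate = 0.11)

# Keep only words scored for both groups, in a common order.
shared <- intersect(names(mac_men), names(mac_women))

# Pearson correlation between the two sets of scores.
r <- cor(mac_men[shared], mac_women[shared], method = "pearson")

# Scatter plot of one group's scores against the other's,
# with the coefficient shown in the title.
plot(mac_men[shared], mac_women[shared],
     xlab = "MAC score with men", ylab = "MAC score with women",
     main = sprintf("men vs. women, r = %.2f", r))
```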

Outlier analysis

The 1810 coha plot had an odd correlation, so let’s check what proportion of each group’s words was available in each decade, since a small set of available words could be skewing the slope.

##            men     women     human  nonhuman year
## 1810 0.4242424 0.3714286 0.5000000 0.1111111 1810
## 1820 0.5757576 0.5714286 0.6428571 0.4444444 1820
## 1830 0.6969697 0.6571429 0.6428571 0.6111111 1830
## 1840 0.6666667 0.6857143 0.6428571 0.6111111 1840
## 1850 0.6969697 0.7142857 0.6428571 0.6666667 1850
## 1860 0.6969697 0.7142857 0.7142857 0.6666667 1860
## 1870 0.6969697 0.6857143 0.8571429 0.6111111 1870
## 1880 0.7272727 0.7142857 0.9285714 0.7222222 1880
## 1890 0.7575758 0.6857143 0.9285714 0.6111111 1890
## 1900 0.7272727 0.7428571 0.9285714 0.7222222 1900
## 1910 0.7878788 0.7428571 0.9285714 0.6111111 1910
## 1920 0.8181818 0.6857143 1.0000000 0.7222222 1920
## 1930 0.8181818 0.7142857 0.9285714 0.7222222 1930
## 1940 0.8181818 0.7142857 0.9285714 0.7222222 1940
## 1950 0.7575758 0.7142857 1.0000000 0.8333333 1950
## 1960 0.7575758 0.6857143 1.0000000 0.8888889 1960
## 1970 0.7878788 0.6571429 1.0000000 0.8888889 1970
## 1980 0.7272727 0.6857143 1.0000000 0.8333333 1980
## 1990 0.6969697 0.7142857 1.0000000 0.8888889 1990
## 2000 0.7575758 0.7142857 1.0000000 0.8333333 2000

Far fewer words were available in that first decade; let’s check for statistical outliers.

##   Decade     Value
## 1   1810 0.4242424
##   Decade     Value
## 1   1810 0.3714286
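
The outlier rule used in this report is not shown, so the following is only one possible check: a z-score rule that flags decades more than three standard deviations from the mean. The function name `find_outliers` and the threshold are assumptions. The input is the coha men column from the table above.

```r
# Availability proportions for the men word list in coha, copied from
# the table above (1810 through 2000).
decades <- seq(1810, 2000, by = 10)
men_avail <- c(0.4242424, 0.5757576, 0.6969697, 0.6666667, 0.6969697,
               0.6969697, 0.6969697, 0.7272727, 0.7575758, 0.7272727,
               0.7878788, 0.8181818, 0.8181818, 0.8181818, 0.7575758,
               0.7575758, 0.7878788, 0.7272727, 0.6969697, 0.7575758)

# Hypothetical helper: return the rows whose absolute z-score exceeds
# the threshold. This is an assumed rule, not necessarily the report's.
find_outliers <- function(decade, value, threshold = 3) {
  z <- (value - mean(value)) / sd(value)
  data.frame(Decade = decade, Value = value)[abs(z) > threshold, ]
}

outliers <- find_outliers(decades, men_avail)
```

Under this rule only 1810 is flagged for the men column. A 1.5 × IQR rule would also flag 1820 on the same data, so the choice of rule matters here.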

Repeat for engall:

##            men     women     human  nonhuman year
## 1800 0.7272727 0.6571429 0.6428571 0.5555556 1800
## 1810 0.7575758 0.7142857 0.7142857 0.6111111 1810
## 1820 0.7878788 0.8000000 0.7142857 0.6111111 1820
## 1830 0.7878788 0.8000000 0.7142857 0.6111111 1830
## 1840 0.7878788 0.8285714 0.7142857 0.7777778 1840
## 1850 0.7878788 0.8285714 0.8571429 0.7777778 1850
## 1860 0.7878788 0.8285714 0.8571429 0.7777778 1860
## 1870 0.7878788 0.8000000 0.9285714 0.8333333 1870
## 1880 0.8484848 0.8857143 0.9285714 0.8333333 1880
## 1890 0.8484848 0.9142857 0.9285714 0.8333333 1890
## 1900 0.8484848 0.9142857 0.9285714 0.8333333 1900
## 1910 0.8787879 0.8571429 1.0000000 0.8333333 1910
## 1920 0.8484848 0.8857143 1.0000000 0.8333333 1920
## 1930 0.8484848 0.8285714 1.0000000 0.8333333 1930
## 1940 0.8484848 0.8285714 1.0000000 0.8333333 1940
## 1950 0.8787879 0.8857143 1.0000000 0.8333333 1950
## 1960 0.9090909 0.9428571 1.0000000 0.9444444 1960
## 1970 0.9393939 0.9428571 1.0000000 1.0000000 1970
## 1980 0.9696970 0.9714286 1.0000000 1.0000000 1980
## 1990 1.0000000 1.0000000 1.0000000 1.0000000 1990

## [1] Decade Value 
## <0 rows> (or 0-length row.names)
## [1] Decade Value 
## <0 rows> (or 0-length row.names)

Engall has no outliers, as expected.
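
For reference, the availability proportions tabulated in this section can be sketched as the share of a group's words that appear in a decade's vocabulary. The vocabularies below are made up, `availability` is a hypothetical helper, and only the first six words of each group list (as printed in Setup) are used.

```r
# Made-up vocabularies for two decades; real ones would come from the
# row names of each decade's embedding matrix.
vocab_by_decade <- list(
  "1810" = c("men", "man", "women"),
  "1990" = c("men", "man", "male", "males", "women", "woman", "female")
)

# First six words of each group list, as printed in Setup.
men_words   <- c("men", "man", "male", "males", "masculine", "masculinity")
women_words <- c("women", "woman", "female", "females", "feminine", "femininity")

# Hypothetical helper: proportion of a word list found in a vocabulary.
availability <- function(words, vocab) mean(words %in% vocab)

# One row per decade, one column per group, plus the year.
avail <- data.frame(
  men   = sapply(vocab_by_decade, function(v) availability(men_words, v)),
  women = sapply(vocab_by_decade, function(v) availability(women_words, v)),
  year  = as.numeric(names(vocab_by_decade))
)
```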

Plots, engall

Now we can begin to plot the actual correlation values over time, starting with engall:

Now, let’s look at the actual magnitudes of the mac scores (rather than the Pearson correlations). To do this, we take the mean of all mac scores with a single group.
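
A minimal sketch of that averaging step, assuming the per-word MAC scores are held in a list keyed by decade. The scores are made up, and the list name `mac_by_decade` is hypothetical; the column names follow the `overall_results_averages` output shown at the end of this report.

```r
# Made-up per-word MAC scores with one group (men), for two decades.
mac_by_decade <- list(
  "1800" = c(able = 0.02, accurate = 0.01, achieve = 0.03),
  "1810" = c(able = 0.01, accurate = 0.00, achieve = 0.02)
)

# Mean MAC score per decade, stored in long format.
averages <- data.frame(
  year        = as.numeric(names(mac_by_decade)),
  value       = sapply(mac_by_decade, mean),
  group1index = "men",
  wordterms   = "agentic",
  corpus      = "engall",
  row.names   = NULL
)
```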

Baseline with nonhuman groups

We can also do a baseline test with different groups to see if the men/women correlations are uniquely high.

Plots, coha

Data in Excel format

The files below contain all of the data and can be filtered in Excel:

head(overall_results)
##   year     value group1index group2index wordterms corpus
## 1 1800 0.7608427         men       women   agentic engall
## 2 1810 0.7966443         men       women   agentic engall
## 3 1820 0.7896832         men       women   agentic engall
## 4 1830 0.7644530         men       women   agentic engall
## 5 1840 0.7750871         men       women   agentic engall
## 6 1850 0.7651039         men       women   agentic engall
write_xlsx(overall_results, "overall_results.xlsx")

overall_results.xlsx

head(overall_results_averages)
##   year       value group1index wordterms corpus
## 1 1800 0.017776823         men   agentic engall
## 2 1810 0.013167537         men   agentic engall
## 3 1820 0.004411563         men   agentic engall
## 4 1830 0.008945463         men   agentic engall
## 5 1840 0.003948094         men   agentic engall
## 6 1850 0.011462237         men   agentic engall
write_xlsx(overall_results_averages, "overall_results_averages.xlsx")

overall_results_averages.xlsx